Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants

Author: AE Urban
D Pinkel
DA Wheeler
DR Bentley
DR Zerbino
F Sanger
GH Perry
J Butler
J Rozowsky
JC Dohm
JC Venter
Jiang Du
JO Korbel
JO Korbel
JY Hehir-Kwa
M Margulies
M Pop
M Pop
Mark B. Gerstein
Michael Snyder
MJ Chaisson
PA Pevzner
R Lippert
R Redon
R Schmid
RL Warren
Robert D. Bjornson
RR Selzer
S Batzoglou
S Levy
SMD Goldberg
V Bansal
William Stafford Noble
Yong Kong
Zhengdong D. Zhang
Publication venue: Public Library of Science
Publication date: 01/07/2009

The goal of human genome re-sequencing is to obtain an accurate assembly of an individual's genome. Recently, there has been great excitement about the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs.
Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.
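
As a rough illustration of the cost-versus-coverage trade-off discussed in this abstract, the sketch below computes idealized Lander-Waterman coverage for a mix of sequencing technologies. It is not the authors' simulation toolbox: it assumes uniformly random read placement, ignores repeats and mappability, and the function names and the dictionary keys (`reads`, `read_length`, `cost_per_base`) are hypothetical.

```python
import math


def expected_coverage(num_reads, read_length, genome_size):
    """Mean depth of coverage c = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_covered(coverage):
    """Expected fraction of bases covered at least once (idealized model)."""
    return 1.0 - math.exp(-coverage)


def mix_summary(technologies, genome_size):
    """Total coverage, covered fraction, and cost for a technology mix.

    Each entry in `technologies` is a dict with the hypothetical keys
    'reads', 'read_length', and 'cost_per_base'.
    """
    total_coverage = 0.0
    total_cost = 0.0
    for tech in technologies:
        bases = tech["reads"] * tech["read_length"]
        total_coverage += bases / genome_size
        total_cost += bases * tech["cost_per_base"]
    return total_coverage, fraction_covered(total_coverage), total_cost
```

The per-base costs are inputs, not estimates; any realistic comparison would need measured prices and error profiles for each platform.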

On the power and the systematic biases of the detection of chromosomal inversions by paired-end genome sequencing

Author: A Bashir
AA Hoffmann
AJ Iafrate
AM Hillmer
AW Pang
B Zeitouni
C Alkan
CB Krimbas
DC Richter
E Tuzun
F Hormozdiari
F Hormozdiari
H Li
H Stefansson
J Cao
J Sebat
J Wang
JC Roach
JM Kidd
JM Kidd
JO Korbel
JO Korbel
José Ignacio Lucas Lledó
K Chen
KF Manly
KJ McKernan
L Feuk
M Onishi-Seebacher
Mario Cáceres
P Medvedev
PJ Campbell
PJ Stephens
R Xi
S Suzuki
SM Ahn
SS Sindi
T Rausch
Y Jiang
ZD Zhang
Zhanjiang Liu
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

One of the most widely used techniques to study structural variation at the genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is well suited to detecting a wide range of inversions, even with low-coverage data. However, % of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far more than increases in coverage depth or read length. Finally, we review the performance of three algorithms for detecting inversions (SVDetect, GRIAL, and VariationHunter), identify common pitfalls, and reveal important differences in their breakpoint precisions. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects.
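
The basic PEM signal behind inversion detection can be sketched as follows. In a standard forward-reverse library, the two mates of a concordant pair map to opposite strands and point toward each other; mates mapping to the same strand suggest an inversion breakpoint. This is a minimal illustration, not the method of SVDetect, GRIAL, or VariationHunter. The function name, the category labels, and the simplification of measuring the span between leftmost mapping positions are all assumptions.

```python
def classify_pair(pos1, strand1, pos2, strand2, min_insert, max_insert):
    """Classify one mapped read pair from a forward-reverse library.

    Positions are leftmost mapping coordinates on the same chromosome;
    strands are '+' or '-'. The span ignores read length for simplicity.
    """
    # Both mates on the same strand: the classic inversion signature.
    if strand1 == strand2:
        return "inversion_candidate"

    # Order the mates by coordinate so orientation can be checked.
    (left_pos, left_strand), (right_pos, _) = sorted(
        [(pos1, strand1), (pos2, strand2)]
    )

    # Left mate on '-' means the mates point away from each other.
    if left_strand == "-":
        return "everted"

    span = right_pos - left_pos
    if span > max_insert:
        return "deletion_candidate"
    if span < min_insert:
        return "insertion_candidate"
    return "concordant"
```

A real caller would cluster many discordant pairs before reporting a variant, since a single same-strand pair can also arise from a mapping error in repetitive sequence.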

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

Diposit Digital de Documents de la UAB

Systematic Inference of Copy-Number Genotypes from Personal Genome Sequencing Data Reveals Extensive Olfactory Receptor Gene Content Diversity

Author: A Keller
Adrian M. Stütz
AE Urban
AJ Iafrate
AJ Sharp
Andreas Schlattl
C Alkan
C Xie
DF Conrad
Doron Lancet
DR Bentley
DY Chiang
F Zhang
H Li
H Li
HY Lam
I Menashe
Ifat Keydar
J Rozowsky
J Sebat
Jan O. Korbel
JDP Wysocki CJ Jr
JM Kidd
JM Kidd
JM Young
JO Korbel
JO Korbel
K Chen
K Ye
KA Frazer
L Feuk
M Nozawa
Miriam Khen
PJ Campbell
R Li
S Lee
S Yoon
SA McCarroll
Sebastian M. Waszak
T Newman
Thomas Zichner
Tsviya Olender
Wyeth W. Wasserman
X She
Y Hasin
Yehudit Hasin
Publication venue: Public Library of Science
Publication date: 11/11/2010

Copy-number variations (CNVs) are widespread in the human genome, but comprehensive assignments of integer locus copy-numbers (i.e., copy-number genotypes) that, for example, enable discrimination of homozygous from heterozygous CNVs, have remained challenging. Here we present CopySeq, a novel computational approach with an underlying statistical framework that analyzes the depth-of-coverage of high-throughput DNA sequencing reads, and can incorporate paired-end and breakpoint-junction-based CNV analysis approaches, to infer locus copy-number genotypes. We benchmarked CopySeq by genotyping 500 chromosome 1 CNV regions in 150 personal genomes sequenced at low coverage. The assessed copy-number genotypes were highly concordant with our qPCR experiments (Pearson correlation coefficient 0.94), and with the published results of two microarray platforms (95–99% concordance). We further demonstrated the utility of CopySeq for analyzing gene regions enriched for segmental duplications by comprehensively inferring copy-number genotypes in the CNV-enriched >800 olfactory receptor (OR) human gene and pseudogene loci. CopySeq revealed that OR loci display an extensive range of locus copy-numbers across individuals, with zero to two copies in some OR loci, and two to nine copies in others. Among genetic variants affecting OR loci we identified deleterious variants including CNVs and SNPs affecting ∼15% and ∼20% of the human OR gene repertoire, respectively, implying that genetic variants with a possible impact on smell perception are widespread. Finally, we found that for several OR loci the reference genome appears to represent a minor-frequency variant, implying a necessary revision of the OR repertoire for future functional studies.
CopySeq can ascertain genomic structural variation in specific gene families as well as at a genome-wide scale, where it may enable the quantitative evaluation of CNVs in genome-wide association studies involving high-throughput sequencing.
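
To make the idea of inferring an integer copy-number genotype from depth of coverage concrete, here is a minimal maximum-likelihood sketch. It is not CopySeq's statistical framework, which the abstract does not specify: it assumes read counts in a locus follow a Poisson distribution whose mean scales linearly with copy number, and `reads_per_copy`, `max_copies`, and `noise` are hypothetical parameters.

```python
import math


def poisson_log_pmf(k, lam):
    """Log probability of observing k events under a Poisson mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def genotype_copy_number(read_count, reads_per_copy, max_copies=10, noise=0.01):
    """Return the integer copy number that maximizes the Poisson likelihood.

    `reads_per_copy` is the expected read count contributed by one copy of
    the locus. `noise` keeps the mean positive for copy number 0, so that a
    few stray reads do not make a homozygous deletion impossible.
    """
    best_copies, best_loglik = 0, -math.inf
    for copies in range(max_copies + 1):
        mean = max(copies, noise) * reads_per_copy
        loglik = poisson_log_pmf(read_count, mean)
        if loglik > best_loglik:
            best_copies, best_loglik = copies, loglik
    return best_copies
```

Under this model a heterozygous deletion (one copy) and a homozygous deletion (zero copies) are separated by their expected read counts, which is the distinction the abstract describes as challenging at low coverage.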

Infoscience - École polytechnique fédérale de Lausanne

Extensive Copy-Number Variation of Young Genes across Stickleback Populations

Author: A Abyzov
A Alexa
A Conesa
A Hussain
AJ Iafrate
AJ Sharp
AJ Vilella
AR Boyko
AR Quinlan
B Guo
BE Deagle
C Eizaguirre
C Eizaguirre
Christophe Eizaguirre
CL McGrath
CL Peichel
D Bryant
D Juan
D Tautz
DE Cook
DH Huson
DJ Turner
DR Schrider
DR Schrider
DR Zerbino
E Gazave
E Proux
Erich Bornberg-Bauer
FA Kondrashov
FC Jones
Frédéric J. J. Chain
G Gibson
G Orti
GC Conant
GH Perry
GH Perry
GM Cooper
H Kehrer-Sawatzki
H Li
Irene E. Samonte
J Sebat
JA Fawcett
Jianzhi Zhang
JJ Emerson
JK Colbourne
JO Korbel
JO Korbel
K Chen
K Khalturin
K Ye
KJ Lipinski
KJ Livak
KM Teshima
KM Wegner
L Xu
LC Hsing
LR Saraiva
M Hiraiwa
M Long
M Long
M Lynch
M Lynch
M Milinski
M Roesti
MA DePristo
Mahesh Panchal
Manfred Milinski
Martin Kalbe
Monika Stoll
N Ghanem
P Danecek
P Flicek
P Sjödin
PA Hohenlohe
PGD Feulner
PH Sudmant
Philine G. D. Feulner
PM Kim
R Redon
RC Iskow
S Moretti
S Sawyer
SF Altschul
SH Williamson
SM Waszak
SR Browning
T Marques-Bonet
T Rausch
TD Schmittgen
Thorsten B. H. Reusch
Tobias L. Lenz
V Guryev
V Katju
V Katju
V Ranwez
X Huang
Y Hashiguchi
Y Hashiguchi
Y Zheng
YE Zhang
YF Chan
Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

MM received funding from the Max Planck innovation funds for this project. PGDF was supported by a Marie Curie European Reintegration Grant (proposal no. 270891). CE was supported by German Science Foundation grants (DFG, EI 841/4-1 and EI 841/6-1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

OceanRep

Queen Mary Research Online

Bern Open Repository and Information System (BORIS)

MPG.PuRe

FigShare

A Hidden Markov Model for Copy Number Variant prediction from whole genome resequencing data

Author: A McKenna
B Langmead
C Alkan
C Xie
DR Bentley
ES Lander
F Hach
H Li
H Li
Itsik Pe’er
J Wang
JO Korbel
K Chen
P Medvedev
R Durbin
R Li
S Lee
S Sarin
S Yoon
Y Shen
Yiwei Gu
Yufeng Shen
Publication venue: BioMed Central
Publication date: 01/01/2011

Motivation: Copy number variants (CNVs) are important genetic factors for studying human diseases. While high-throughput whole-genome re-sequencing provides multiple lines of evidence for detecting CNVs, computational algorithms need to be tailored to different types and sizes of CNVs under different experimental designs. Results: To achieve optimal power and resolution in detecting CNVs at low depth of coverage, we implemented a Hidden Markov Model that integrates both depth of coverage and mate-pair relationships. The novelty of our algorithm is that we infer the likelihood of carrying a deletion jointly from multiple mate pairs in a region, without requiring any single mate pair to be an obvious outlier. By integrating all useful information in a comprehensive model, our method is able to detect medium-size deletions (200-2000 bp) at low depth (<10× per sample). We applied the method to simulated data and demonstrate that its power to detect medium-size deletions is close to theoretical values. Availability: A program implemented in Java, Zinfandel, is available at http://www.cs.columbia.edu/~itsik/zinfandel
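
The Hidden Markov Model idea can be sketched with a two-state Viterbi decoder over windowed read depths. This is a simplified illustration and not Zinfandel's model: it uses depth of coverage only (the published method also integrates mate-pair information), assumes Poisson emissions with a heterozygous deletion halving the expected depth, and the state names and `stay_prob` parameter are hypothetical.

```python
import math


def poisson_log_pmf(k, lam):
    """Log probability of observing k events under a Poisson mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_deletions(depths, normal_depth, stay_prob=0.99):
    """Label each window 'normal' or 'deletion' with the Viterbi algorithm."""
    states = ("normal", "deletion")
    # Assumption: a heterozygous deletion halves the expected depth.
    means = {"normal": normal_depth, "deletion": normal_depth / 2.0}
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)

    def transition(prev, cur):
        return log_stay if prev == cur else log_switch

    # Initialise with a uniform prior over the two states.
    score = {s: math.log(0.5) + poisson_log_pmf(depths[0], means[s]) for s in states}
    backpointers = []
    for depth in depths[1:]:
        new_score, pointers = {}, {}
        for cur in states:
            best_prev = max(states, key=lambda p: score[p] + transition(p, cur))
            pointers[cur] = best_prev
            new_score[cur] = (
                score[best_prev]
                + transition(best_prev, cur)
                + poisson_log_pmf(depth, means[cur])
            )
        backpointers.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Because the transition penalty is paid only at the boundaries, several moderately low windows in a row can support a deletion call even when no single window is an obvious outlier, which mirrors the joint-evidence argument in the abstract.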

Springer - Publisher Connector

Columbia University Academic Commons

PhyloPat: phylogenetic pattern analysis of eukaryotic genes

Author: A Kasprzyk
C Minguillon
DA Natale
DL Wheeler
E Birney
F Al-Shahrour
F Chen
GP Wagner
H Li
Jacob de Vlieg
JF Dufayard
JO Korbel
K Reichard
M Ashburner
Peter MA Groenen
PS Dehal
R Fredriksson
RC Edgar
S Guindon
T Hulsen
TA Eyre
Tim Hulsen
V Matys
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Phylogenetic patterns show the presence or absence of certain genes or proteins in a set of species. They can also be used to determine sets of genes or proteins that occur only in certain evolutionary branches. Phylogenetic pattern analysis has routinely been applied to protein databases such as COG and OrthoMCL, but not to gene databases. Here we present a tool named PhyloPat which allows the complete Ensembl gene database to be queried using phylogenetic patterns. DESCRIPTION: PhyloPat is an easy-to-use webserver, which can be used to query the orthologies of all complete genomes within the EnsMart database using phylogenetic patterns. This enables the determination of sets of genes that occur only in certain evolutionary branches or even single species. We found in total 446,825 genes and 3,164,088 orthologous relationships within the EnsMart v40 database. We used a single linkage clustering algorithm to create 147,922 phylogenetic lineages, using every one of the orthologies provided by Ensembl. PhyloPat provides the possibility of querying with either binary phylogenetic patterns (created by checkboxes) or regular expressions. Specific branches of a phylogenetic tree of the 21 included species can be selected to create a branch-specific phylogenetic pattern. Users can also input a list of Ensembl or EMBL IDs to check which phylogenetic lineage any gene belongs to. The output can be saved in HTML, Excel or plain text format for further analysis. A link to the FatiGO web interface has been incorporated in the HTML output, creating easy access to functional information. Finally, lists of omnipresent, polypresent and oligopresent genes have been included. CONCLUSION: PhyloPat is the first tool to combine complete genome information with phylogenetic pattern querying. Since we used the orthologies generated by the accurate pipeline of Ensembl, the obtained phylogenetic lineages are reliable.
The completeness and reliability of these phylogenetic lineages will further increase with the addition of newly found orthologous relationships within each new Ensembl release.
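
The two steps described in this abstract, single linkage clustering of orthology pairs into lineages and querying binary presence/absence patterns with regular expressions, can be sketched as below. This is an illustration under assumptions, not PhyloPat's implementation: the function names, the union-find approach, and the gene and species identifiers in the usage example are hypothetical.

```python
import re
from collections import defaultdict


def cluster_lineages(orthology_pairs):
    """Single linkage clustering: genes joined by any chain of pairs share a lineage."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for gene_a, gene_b in orthology_pairs:
        root_a, root_b = find(gene_a), find(gene_b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for gene in list(parent):
        clusters[find(gene)].add(gene)
    return list(clusters.values())


def binary_pattern(lineage, gene_species, species_order):
    """Encode a lineage as a string of 1s and 0s, one character per species."""
    present = {gene_species[gene] for gene in lineage}
    return "".join("1" if species in present else "0" for species in species_order)


def query_patterns(patterns, regex):
    """Return the names of lineages whose whole pattern matches the regex."""
    compiled = re.compile(regex)
    return [name for name, pattern in patterns.items() if compiled.fullmatch(pattern)]
```

For example, with species ordered as human, mouse, zebrafish, the query `110` would select lineages present in both mammals but absent from zebrafish, while `1.0` would accept either state for mouse.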

Springer - Publisher Connector

Radboud Repository

Mycobacterial Heparin-binding Hemagglutinin Antigen Activates Inflammatory Responses through PI3-K/Akt, NF-κB, and MAPK Pathways

Author: A-Rum Shin
Alemán
Baeuerle
Bermudez
Brightbill
Bulut
Chul-Su Yang
Darieva
Delogu
Eun-Kyeong Jo
Ghosh
Guha
Hawes
Hougardy
Hwa-Jung Kim
Jeong-Kyu Park
Jo
Jung
Jung
Jung
Ki-Hye Kim
Korbel
Lee
Lee
Lee
Locht
Maiti
Menozzi
Menozzi
Parra
Pathak
Pethe
Place
Rajaram
Roach
Schorey
Sendide
Shin
Sly
So-Ra Jeon
Song
Temmerman
Thoma-Uszynski
Toossi
Vanhaesebroeck
Wallis
Wang
Weir
Yadav
Yang
Yang
Zanetti
Zhang
Publication venue: The Korean Association of Immunologists
Publication date: 01/01/2011

Drug-resistant genotypes and multi-clonality in Plasmodium falciparum analysed by direct genome sequencing from peripheral blood of malaria patients.

Naturally acquired blood-stage infections of the malaria parasite Plasmodium falciparum typically harbour multiple haploid clones. The apparent number of clones observed in any single infection depends on the diversity of the polymorphic markers used for the analysis, and the relative abundance of rare clones, which frequently fail to be detected among PCR products derived from numerically dominant clones. However, minority clones are of clinical interest as they may harbour genes conferring drug resistance, leading to enhanced survival after treatment and the possibility of subsequent therapeutic failure. We deployed new generation sequencing to derive genome data for five non-propagated parasite isolates taken directly from 4 different patients treated for clinical malaria in a UK hospital. Analysis of depth of coverage and length of sequence intervals between paired reads identified both previously described and novel gene deletions and amplifications. Full-length sequence data were extracted for 6 loci considered to be under selection by antimalarial drugs, and both known and previously unknown amino acid substitutions were identified. Full mitochondrial genomes were extracted from the sequencing data for each isolate, and these were compared against a panel of polymorphic sites derived from published or unpublished but publicly available data. Finally, genome-wide analysis of clone multiplicity was performed, and the number of infecting parasite clones was estimated for each isolate. Each patient harboured at least 3 clones of P. falciparum by this analysis, consistent with results obtained with conventional PCR analysis of polymorphic merozoite antigen loci. We conclude that genome sequencing of peripheral blood P. falciparum taken directly from malaria patients provides high quality data useful for drug resistance studies, genomic structural analyses and population genetics, and also robustly represents clonal multiplicity.

LSHTM Research Online